A Genetic Semi-supervised Fuzzy Clustering Approach to Text Classification

Authors

  • Hong Liu
  • Shang-Teng Huang

Abstract

A genetic semi-supervised fuzzy clustering algorithm is proposed that learns a text classifier from both labeled and unlabeled documents. Each chromosome encodes a fuzzy partition of the unlabeled documents, and the labeled documents guide the evolution of each chromosome. The fitness of each chromosome is evaluated by combining the fuzzy within-cluster variance of the unlabeled documents with the misclassification error on the labeled documents. The resulting cluster structure can then be used to classify new documents. Experimental results show that the proposed approach significantly improves text classification accuracy compared with text classifiers trained only on a small number of labeled documents. The approach also performs at least as well as a similar approach, EM with Naïve Bayes.
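The abstract gives the ingredients of the method but not its exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes: documents are dense term vectors; a chromosome is a c × n membership matrix over the unlabeled documents whose columns sum to 1; cluster centers and the "fuzzy within-cluster variance" follow the standard fuzzy c-means forms (centers as u^m-weighted means, objective J_m = Σ u_ik^m ‖x_k − v_i‖²); there is one cluster per class, with cluster i standing for class i; a labeled or new document is assigned to its nearest center; and fitness is J_m plus a hypothetical weight `alpha` times the number of misclassified labeled documents (lower is better). The genetic operators (truncation selection, column-wise uniform crossover, noise mutation) and all function names are hypothetical choices for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two term vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centers_from_partition(U, X, m):
    """Fuzzy c-means style centers (assumption): v_i = sum_k u_ik^m x_k / sum_k u_ik^m."""
    dim = len(X[0])
    centers = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        centers.append([sum(wk * x[j] for wk, x in zip(w, X)) / total for j in range(dim)])
    return centers

def nearest_center(doc, centers):
    """Assign a document to its closest center; cluster i is taken to mean class i."""
    return min(range(len(centers)), key=lambda i: sq_dist(doc, centers[i]))

def fitness(U, X_unl, X_lab, y_lab, m=2.0, alpha=1.0):
    """Lower is better: fuzzy within-cluster variance + alpha * labeled errors (hypothetical weighting)."""
    centers = centers_from_partition(U, X_unl, m)
    j_m = sum(u ** m * sq_dist(x, v)
              for row, v in zip(U, centers) for u, x in zip(row, X_unl))
    errors = sum(nearest_center(x, centers) != y for x, y in zip(X_lab, y_lab))
    return j_m + alpha * errors

def random_partition(c, n, rng):
    """Random fuzzy partition: every column (one unlabeled document) sums to 1."""
    cols = []
    for _ in range(n):
        raw = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(raw)
        cols.append([r / s for r in raw])
    return [[cols[k][i] for k in range(n)] for i in range(c)]

def crossover(A, B, rng):
    """Uniform crossover: each document's whole membership column comes from one parent."""
    n = len(A[0])
    pick = [rng.random() < 0.5 for _ in range(n)]
    return [[a[k] if pick[k] else b[k] for k in range(n)] for a, b in zip(A, B)]

def mutate(U, rng, rate=0.1, scale=0.3):
    """Add noise to some columns, then renormalize each touched column to sum to 1."""
    c, n = len(U), len(U[0])
    V = [row[:] for row in U]
    for k in range(n):
        if rng.random() < rate:
            col = [max(1e-9, V[i][k] + rng.uniform(-scale, scale)) for i in range(c)]
            s = sum(col)
            for i in range(c):
                V[i][k] = col[i] / s
    return V

def evolve(X_unl, X_lab, y_lab, c, m=2.0, alpha=1.0, pop_size=20, generations=50, seed=0):
    """Evolve fuzzy partitions; the best half survives each generation (elitist)."""
    rng = random.Random(seed)
    score = lambda U: fitness(U, X_unl, X_lab, y_lab, m, alpha)
    pop = [random_partition(c, len(X_unl), rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = parents + children
    best = min(pop, key=score)
    return best, centers_from_partition(best, X_unl, m)

# Toy usage: two "topics" in a 2-term vocabulary, one labeled document per class.
X_unl = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
X_lab = [[1.0, 0.05], [0.05, 1.0]]
y_lab = [0, 1]
U, centers = evolve(X_unl, X_lab, y_lab, c=2)
print(nearest_center([0.95, 0.0], centers), nearest_center([0.0, 0.95], centers))
```

The labeled-error term is what ties the otherwise unsupervised partition to the class labels: two partitions with the same within-cluster variance but swapped cluster-to-class assignments get different fitness, so selection favors the one consistent with the labeled documents. The last line shows how the evolved cluster structure could classify previously unseen documents.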

Similar resources

Text Categorization using the Semi-Supervised Fuzzy c-Means Algorithm

Text Categorization (TC) is the automated assignment of text documents to predefined categories based on document contents. For the past few years, TC has become very important essentially in the Information Retrieval area, where information needs have tremendously increased with the rapid growth of textual information sources such as the Internet. In this paper, we compare, for text categoriz...

Semi-supervised Kernel-Based Fuzzy C-Means

This paper presents a semi-supervised kernel-based fuzzy c-means algorithm called S2KFCM by introducing semi-supervised learning technique and the kernel method simultaneously into conventional fuzzy clustering algorithm. Through using labeled and unlabeled data together, S2KFCM can be applied to both clustering and classification tasks. However, only the latter is concerned in this paper. Expe...

A Semi-supervised Text Clustering Algorithm Based on Pairwise Constraints

In this paper, an active learning method which can effectively select pairwise constraints during clustering procedure was presented. A novel semi-supervised text clustering algorithm was proposed, which employed an effective pairwise constraints selection method. As the samples on the fuzzy boundary are far away from the cluster center in the clustering procedure, they can be easily divided in...

Fuzzy semi-supervised co-clustering for text documents

In this paper we propose a new heuristic semi-supervised fuzzy co-clustering algorithm (SS-HFCR) for categorization of large web documents. In this approach, the clustering process is carried out by incorporating some prior knowledge in the form of pair-wise constraints provided by users into the fuzzy co-clustering framework. Each constraint specifies whether a pair of documents “must” or “can...

Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve the clustering efficiency, most of the state-of-the-art schemes suffer from the lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...

Publication date: 2003